In this Take-home Exercise, I will explore to reveal the daily routine of two selected participants of the city of Engagement, Ohio USA.
Challenge 2: Patterns of Life considers the patterns of daily life throughout the city. You will describe the daily routines for some representative people, characterize the travel patterns to identify potential bottlenecks or hazards, and examine how these patterns change over time and seasons.
In Challenge 2, you will use visual analytic techniques to address these questions:
In this take-home exercise, ViSiElse and other appropriate visual analytics methods are used to reveal the life patterns of the city of Engagement, Ohio USA while addressing the questions stated in the Task section.
The data are processed by using appropriate tidyverse family of packages.
The picture below shows a sketch of the initial design proposed.
Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.
The chunk code below will do the trick.
packages = c('tidyverse', 'data.table', 'knitr', 'ViSiElse', 'kableExtra', 'lubridate', 'ggplot2', 'ggiraph', 'plotly', 'DT', 'patchwork','crosstalk')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
The code chunk below imports ActivityLogs files from the
rawdata folder into R by using list.files(),
read_csv()
of readr
package, and map_df() of purrr
package, and and save it as a tibble data frame called
logs.
memory.limit(size=160000)
logs <- list.files(path = "./rawdata/ActivityLogs/",
pattern = "*.csv",
full.names = T) %>%
map_df(~read_csv(.,
col_types = cols(.default = "c")))
During our data exploration, we realized some of the participants starts off with unstable financial status while some having stable financial status. As such, we will choose 1 participant each, with partipantId 4 from the unstable financial status group and partipantId 7 from the stable financial status group, and to look after their differing routines.
The code chunk below limits our logs data to the only
selected participants using filter() function.
logs_two_p <- logs %>%
filter(participantId == '4' | participantId == '7')
Data frame logs_two_p is saved in RDS format to avoid uploading large files to Git.
saveRDS(logs_two_p, 'data/logs_two_p.rds')
logs_two_p <- readRDS('data/logs_two_p.rds')
We will use kable() to review the structure of the data
frame and kable_classic of kableExtra has
been applied to ensure the full width of the table can be preserved.
kable(head(logs_two_p), format = "html", table.attr = "style='width:40%;'" ) %>%
kable_classic(full_width = T, position = "center", )
| timestamp | currentLocation | participantId | currentMode | hungerStatus | sleepStatus | apartmentId | availableBalance | jobId | financialStatus | dailyFoodBudget | weeklyExtraBudget |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-03-01T00:00:00Z | POINT (976.2409614204214 4574.575079082071) | 4 | AtHome | JustAte | Sleeping | 194 | -681.6505879983722 | 177 | Unstable | 20 | 0 |
| 2022-03-01T00:00:00Z | POINT (-1023.8165705255449 1578.3713681439597) | 7 | AtHome | JustAte | Sleeping | 97 | 683.8632217708105 | 270 | Stable | 20 | 1164.0690586556877 |
| 2022-03-01T00:05:00Z | POINT (976.2409614204214 4574.575079082071) | 4 | AtHome | JustAte | Sleeping | 194 | -681.6505879983722 | 177 | Unstable | 20 | 0 |
| 2022-03-01T00:05:00Z | POINT (-1023.8165705255449 1578.3713681439597) | 7 | AtHome | JustAte | Sleeping | 97 | 683.8632217708105 | 270 | Stable | 20 | 1164.0690586556877 |
| 2022-03-01T00:10:00Z | POINT (976.2409614204214 4574.575079082071) | 4 | AtHome | JustAte | Sleeping | 194 | -681.6505879983722 | 177 | Unstable | 20 | 0 |
| 2022-03-01T00:10:00Z | POINT (-1023.8165705255449 1578.3713681439597) | 7 | AtHome | JustAte | Sleeping | 97 | 683.8632217708105 | 270 | Stable | 20 | 1164.0690586556877 |
We will add date and year_month columns for further analysis.
Let’s look at the different activities from currentMode:
[1] "AtHome" "Transport" "AtWork" "AtRestaurant"
[5] "AtRecreation"
We will now calculate the duration of each activity in seconds and
transform data using pivot_wider() of
tidyr to suit the usage for visielse()
visuals.
logs_daily_ss <- logs_two_p %>%
group_by(date, participantId, currentMode) %>%
summarise(count = n()) %>%
mutate(ss = count * 5 *60) %>%
ungroup() %>%
select(date, participantId, currentMode, ss) %>%
pivot_wider(names_from=currentMode, values_from=ss,
values_fill=0)
| date | participantId | AtHome | AtRestaurant | AtWork | Transport | AtRecreation |
|---|---|---|---|---|---|---|
| 2022-03-01 | 4 | 54600 | 1200 | 28800 | 1800 | 0 |
| 2022-03-01 | 7 | 28200 | 2400 | 29100 | 21300 | 5400 |
| 2022-03-02 | 4 | 53700 | 1200 | 29100 | 2400 | 0 |
| 2022-03-02 | 7 | 27600 | 1200 | 29100 | 17100 | 11400 |
| 2022-03-03 | 4 | 33000 | 1200 | 29700 | 12300 | 10200 |
| 2022-03-03 | 7 | 27600 | 2400 | 29100 | 22500 | 4800 |
Before plotting, we will set an order to the activities.
logs_daily_ss <- logs_daily_ss[,c(1,2,5,4,3,6)]
visielse(logs_daily_ss, group = logs_daily_ss$participantId, informer = NULL, method = "cut", pixel = 1000)
-parameters
method : cut
grwithin :
quantity : N
informer :
tests : FALSE
threshold.test : 0.01
pixel : 1000
t_0 : 0
-MATp : 10 x 89 sparse Matrix of class "dgCMatrix"
-L : 0 x 0 data.frame
-idsort : 0 x 0 matrix
-MATpsup : 0 x 0 sparse Matrix of class "dgCMatrix"
-idsup : length 0 vector
-Lsup : 0 x 0 data.frame
-colvect : 2 x 1 matrix
-BZL : 0 x 0 sparse Matrix of class "dgCMatrix"
-book : 5 x 6 ViSibook
-group : length 900 factor
-vect_tps : length 89 vector
-testsP : length 0 vector
-informers : 0 x 0 matrix
We realize that in general Participant 4 is spending less time in restaurant and at recreation than Participant 7, possibly due to his unstable financials status. This is more obvious in the difference in time spent for recreation where Participant 4 might want to restrict his non-necessity related spending in order to improve his financial outlook. This also aligns with the time spent at home where Participant * spent much more time at home than Participant 7. We also spot that both participants are having the same amount of time spent at work. As for transport perspective, Participant 4 is spending less time than Participant 7, potentially due to a closer location from home.
We have transformed the data accordingly and displayed a trellis plot partitioned by currentMode
qplot(year_month, hour, data = logs_two_p_count, color = participantId) +
facet_wrap(~ currentMode) +
labs(y= 'Hour', x= 'Time',
title = "Trend for Respective Activities Over Time") +
guides(color = guide_legend(title = "Participant Id")) +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(),
axis.line= element_line(color= 'grey'))
We can see that despite a few of the irregular patterns, generally the time spent on each activity follow similar trends for both participant 4 and participant 7.
In order to add interactivity to the previous trellis plot, we will
use ggplotly() of plotly and
bscols() of crosstalk.
dd <- highlight_key(logs_two_p_count)
graph1 <- qplot(year_month, hour, data = logs_two_p_count, color = participantId) +
facet_wrap(~ currentMode) +
labs(y= 'Hour', x= 'Time',
title = "Trend for Respective Activities Over Time") +
guides(color = guide_legend(title = "Participant Id")) +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(),
axis.line= element_line(color= 'grey'))
gg <- highlight(ggplotly(graph1),
"plotly_selected")
crosstalk::bscols(gg,
widths = c(12, 12),
DT::datatable(dd,
rownames = FALSE,
colnames = c('Year Month' = 'year_month',
'Participant Id' = 'participantId',
'Current Mode' = 'currentMode',
'Hour' = 'hour'),
filter = 'top',
class = 'display'))